-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: perform reductions block-wise #29847
Conversation
pandas/core/internals/managers.py
Outdated
for blk in self.blocks: | ||
bres = func(blk.values, *args, **kwargs) | ||
if np.ndim(bres) == 0 and blk.shape[0] != 1: | ||
# i.e. we reduced over all axes and not just one; re-do column-wise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand this case. How do we get here? re-calling func
doesn't seem ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC it was when we have axis=None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after some digging, this appears to be coming from nanops funcs that are either not getting axis passed or are not handling it correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to make this check unnecessary
Hello @jbrockmendel! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-12-27 20:20:58 UTC |
Consolidated the conditions under which we go through the block-wise path. In a follow-up, this will allow for significant cleanup of the non-blockwise path (i.e. existing _reduce code) because the set of cases it has to handle is cut down. |
i hypothesize that after this and #30416 we'll be able to get rid of quite a bit of the existing special case handling in DataFrame._reduce. |
docbuild failure looks unrelated |
@jreback @TomAugspurger ideally id like to get this and #29941 in before the RC. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look nice. Any user-facing changes that need a whatsnew?
Do you think we have sufficient test coverage here?
# After possibly _get_data and transposing, we are now in the | ||
# simple case where we can use BlockManager._reduce | ||
res = df._data.reduce(op, axis=1, skipna=skipna, **kwds) | ||
assert isinstance(res, dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Planning to keep these asserts in?
None that I've identified.
It seems pretty solid.
Fine by me either way. |
ok by me (assuming you are going to back and de-gross / consolidate this code at some point). |
The code added here is is pretty de-grossed. Some of the assertions could be removed to make it less verbose, but unless we get 2D EAs (which is a windmill ive mostly stopped tilting at), this is about as good as it gets. That said, inside DataFrame._reduce below the code introduced here is a bunch of really gross code that we can clean up after this. |
k thanks |
No description provided.